3.1 Overview of plot() and ggplot()

Exploring the Visualization Capabilities of R

R offers a wide range of visualization capabilities.
The following sections explore three of them:

  • plot()
  • ggplot()
  • the magick package

Visualization is especially useful when we are dealing with large volumes of data, when the data changes periodically and, most importantly, when we want to understand correlations or other patterns in the data.

Introduction to the plot() Function

The plot() function is one of R's basic built-in graphics functions. It draws two-dimensional graphs and charts and is often used to visualize the relationship between two variables. The most common graphs produced with it are scatter plots and line graphs.

The generic syntax for the plot() function is:

plot(x, y, ...)

A fuller call that sets the most common arguments is:

plot(x, y, type, main, sub, xlab, ylab)

The type argument selects the kind of plot:

"p": points
"l": lines
"b": both points and lines
"c": the lines part of "b" alone, with gaps where the points would be
"o": both lines and over-plotted points
"h": histogram-like vertical lines
"s": stair steps
"n": no plotting

The remaining arguments label the plot:

main: the main title
sub: the subtitle
xlab: the x-axis label
ylab: the y-axis label
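
As a minimal sketch of how the type argument changes the result, the following draws the same data six times, once per type. The hours and score vectors are made-up values used only for illustration, and the output is written to a temporary PDF file (an assumption made so that the example also runs without a display).

```r
# Made-up sample data, used only for illustration
hours <- c(1, 2, 3, 4, 5, 6)
score <- c(52, 58, 61, 70, 74, 83)

# Write to a temporary PDF so the sketch also runs without a display
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 9, height = 6)

# Arrange six panels in a 2 x 3 grid, one per plot type
par(mfrow = c(2, 3))
types <- c("p", "l", "b", "o", "h", "s")
for (plot_type in types) {
  plot(hours, score, type = plot_type,
       main = paste("type =", plot_type),
       sub = "same data, different type",
       xlab = "Hours studied", ylab = "Exam score")
}

invisible(dev.off())  # close the device so the file is written
```

Opening out_file in a PDF viewer shows the six variants side by side, which makes it easier to pick a type for a given dataset.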

Example Exercise

We have 10 students in two different courses and their grades for their recent exam.
The X variable denotes the first course and the Y variable denotes the second course.

X = 40, 15, 50, 12, 22, 29, 21, 35, 14, 15
Y = 41, 42, 32, 14, 42, 27, 13, 50, 33, 22

Put them into the correct syntax and create a line plot using type "l".

First, store X in a vector, then pass that variable to plot() to draw a line plot. Because only one vector is supplied, plot() draws each grade against its position in the vector (1 to 10).

X <- c(40, 15, 50, 12, 22, 29, 21, 35, 14, 15)
plot(X, type = "l")

Next, store Y in a vector, then pass that variable to plot() to draw a points plot.

Y <- c(41, 42, 32, 14, 42, 27, 13, 50, 33, 22)
plot(Y, type = "p")
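
Since both vectors describe the same ten students, the two sets of grades can also be plotted against each other to look for a relationship between the courses. The following is a minimal sketch using only base R; the title and axis labels are illustrative choices, and cor() is used here as one simple way to quantify the relationship.

```r
# Grades from the exercise above
X <- c(40, 15, 50, 12, 22, 29, 21, 35, 14, 15)
Y <- c(41, 42, 32, 14, 42, 27, 13, 50, 33, 22)

# Scatter plot of second-course grades against first-course grades
plot(X, Y, type = "p",
     main = "Exam grades of 10 students",
     xlab = "Grade in first course", ylab = "Grade in second course")

# Pearson correlation coefficient between the two courses (between -1 and 1)
r <- cor(X, Y)
print(r)
```

A value of r close to 1 or -1 would indicate a strong linear relationship between the two courses, while a value near 0 would indicate little or none.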

There are many ways to use the basic plot() function to your advantage.
Below is a sample of the other visualization capabilities of the default plot() function.


Source: Kumar, 2020

3.2 Case Study 1: Visualization capabilities using plot()

Case Study 1 - using plot()

Exploring the visualization features of the nCov2019 R package, developed by Dr. Guangchuang Yu of Southern Medical University

This package allows us to access the latest and historical case data for all countries, plot the data on a map, and create various graphs.

We can access the data by first installing the package from GitHub.

Package: nCov2019
By: Dr. Guangchuang Yu (Southern Medical University)

remotes::install_github("GuangchuangYu/nCov2019")  

library(nCov2019)
get_nCov2019()
load_nCov2019()

Then we load the packages needed for the analysis; require() returns FALSE with a warning if a package is not installed.

require(nCov2019)
require(dplyr)

Now we can get a first impression of the dataset.

The get_nCov2019() function downloads the latest statistics on COVID-19 and returns them as an object; the load_nCov2019() function loads the historical data.

Assigning the result of get_nCov2019() to x keeps the latest statistics for later use.
Assigning the result of load_nCov2019() to y does the same for the historical data.

x <- get_nCov2019()
y <- load_nCov2019()

We can then check the results by printing x and y. Printing x shows the total number of confirmed cases in China and when the data was last updated; printing y shows when the historical data was last updated.

x
  
China (total confirmed cases): 95901
last update: 2020-12-21 20:45:32
y
  
nCov2019 historical data 
last update: 2020-11-26 

We can also easily check worldwide statistics for details on confirmed cases, deaths, and so on.
Apart from China, which is listed first, the rows are sorted by the number of confirmed cases.

x['global']
    name           confirm   suspect dead    deadRate  showRate  heal
1   China          95901      7      4771    4.97      FALSE     89480
2   United States  18277433   0      324898  1.78      FALSE     10622096
3   India          10055560   0      145810  1.45      FALSE     9606111
4   Brazil         7238600    0      186764  2.58      FALSE     6409986
5   Russia         2850042    0      50723   1.78      FALSE     2273510
6   France         2529756    0      60665   2.4       FALSE     189638
7   United Kingdom 2079564    0      67718   3.26      FALSE     4380
8   Turkey         2043704    0      18351   0.9       FALSE     1834705
9   Italy          1964054    0      69214   3.52      FALSE     1281258
10  Spain          1817448    0      48926   2.69      FALSE     196958
11  Argentina      1541285    0      41813   2.71      FALSE     1368346
12  Germany        1531998    0      26655   1.74      FALSE     1129280
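
The deadRate column appears to be the number of deaths as a percentage of confirmed cases. The following sketch checks this against the first three rows of the output above. The data frame is typed in by hand from that output rather than downloaded, and the interpretation of deadRate is an assumption inferred from the numbers, not taken from the package documentation.

```r
# First three rows copied by hand from the output above
top3 <- data.frame(
  name    = c("China", "United States", "India"),
  confirm = c(95901, 18277433, 10055560),
  dead    = c(4771, 324898, 145810)
)

# Deaths as a percentage of confirmed cases, rounded to two decimals
top3$rate <- round(top3$dead / top3$confirm * 100, 2)
print(top3)

# Sort by confirmed cases, largest first
print(top3[order(top3$confirm, decreasing = TRUE), ])
```

The computed rates match the deadRate values shown for these three rows (4.97, 1.78 and 1.45).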

A static heat map using plot()

plot() is a generic function, so packages can extend it to their own objects; nCov2019 uses this to draw its data on a map.
Since the latest statistics are already stored in x, we can pass x to plot() to get a static heat map.

plot(x)

3.3 Case Study 2: Visualization capabilities using ggplot()

Introduction to ggplot()

Unlike the plot() function, which ships with R by default, ggplot() comes from the ggplot2 package, which must be installed separately. ggplot2 lets you create visualizations declaratively: you tell it how to map variables to aesthetics and it takes care of the details.

ggplot2 makes it simple to create complex plots from data in a data frame.
It provides a more programmatic interface for specifying which variables to plot, how they are displayed, and their general visual properties. As a result, only minimal changes are needed if the underlying data changes or if we decide to switch from a bar plot to a scatterplot.
This helps in creating publication-quality plots with little adjustment and tweaking.
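
As a minimal sketch of that idea, the following builds one plot specification and then switches between a bar plot and a scatterplot by changing only the geometry layer. The sales data frame and its column names are made up for illustration, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Made-up data, used only for illustration
sales <- data.frame(
  month = c("Jan", "Feb", "Mar", "Apr"),
  units = c(12, 18, 9, 15)
)
# Keep the months in calendar order instead of alphabetical order
sales$month <- factor(sales$month, levels = sales$month)

# Shared specification: which variables map to which aesthetics
base_plot <- ggplot(sales, aes(x = month, y = units)) +
  labs(title = "Units sold per month", x = "Month", y = "Units")

bar_plot     <- base_plot + geom_col()            # bar plot
scatter_plot <- base_plot + geom_point(size = 3)  # scatterplot: only the layer changes

print(bar_plot)
print(scatter_plot)
```

Because the data and the aesthetic mapping live in base_plot, replacing the sales data frame or swapping the geometry does not require touching the rest of the specification.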

Below are some examples of what ggplot2 can produce. Compared with plot(), it offers considerably more control over aesthetics and programmability.



Source: Holtz, 2020

Case Study 2 - using ggplot()

Based on the same data as Case Study 1, it is possible to extract the 10 countries with the most confirmed cases (excluding China) and plot them with ggplot().

# Obtain the top 10 countries by cumulative confirmed cases
d <- y['global']                # extract global historical data
d <- d[d$country != 'China', ]  # exclude China
n <- d %>%
  filter(time == time(y)) %>%   # keep only the most recent date in the dataset
  top_n(10, cum_confirm) %>%
  arrange(desc(cum_confirm))

# Plot the top 10 from Feb 01 to the most recent date in the dataset
require(ggplot2)
require(ggrepel)
ggplot(filter(d, country %in% n$country, time > '2020-02-01'),
       aes(time, cum_confirm, color = country)) +
  geom_line() +
  # label each line once, at the most recent date
  geom_text_repel(aes(label = country),
                  data = function(d) d[d$time == time(y), ])

This results in a colorful, easy-to-read line graph like the one below.

This graph shows the 10 countries with the most confirmed cases outside of China. The United States and India have the most confirmed cases and show exponential growth. Meanwhile, the seven countries below Brazil have flattened their curves and slowed growth through effective containment strategies. Some European countries, such as France and Italy, are also seeing hundreds and thousands of new cases.

A gauge plot - Incubation Time

Below is a gauge-type plot created with ggplot().

Quartiles and extreme percentiles of a continuous distribution are often displayed with box plots. However, box plots are a rather 'statistical' diagram that many readers find hard to interpret, whereas gauge plots are much easier to understand and are common in dashboard applications.

Using 181 publicly reported confirmed cases in China, one modeling study estimated that symptoms of COVID-19 would develop in 2.5% of the infected within 2.2 days and in 97.5% of the infected within 11.5 days. Nearly all of the infected develop symptoms within 14 days, which explains why the quarantine periods required in many countries are between 10 and 14 days.
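
The two percentiles quoted above are enough to sketch the shape of the incubation period. The following assumes, purely for illustration, that incubation time follows a log-normal distribution and fits it to the 2.5% and 97.5% percentiles. The choice of distribution and the variable names are assumptions, not details given in the text.

```r
# Percentiles quoted in the text (days)
q_low  <- 2.2    # 2.5% of the infected show symptoms by this day
q_high <- 11.5   # 97.5% of the infected show symptoms by this day

# For a log-normal distribution, log(time) is normal, so the two percentiles
# lie the same number of standard deviations either side of the mean
z       <- qnorm(0.975)
meanlog <- (log(q_low) + log(q_high)) / 2
sdlog   <- (log(q_high) - log(q_low)) / (2 * z)

# Implied median incubation time and share with symptoms within 14 days
median_days <- exp(meanlog)
share_14    <- plnorm(14, meanlog, sdlog)
print(c(median_days = median_days, share_within_14_days = share_14))
```

Under this assumption, the share with symptoms within 14 days is above 97.5%, since 14 days lies beyond the 97.5% percentile of 11.5 days.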


Source: Kanevsky, 2020

A bar chart - Spectrum of Illness Severity

The spectrum of illness severity ranges from mild to critical, and most infections are not severe. A study of 44,500 confirmed cases by the Chinese Center for Disease Control and Prevention reported the following distribution of severity:
- 81% mild (no or mild pneumonia)
- 14% severe (with dyspnea (shortness of breath), hypoxia, or >50 percent lung involvement on imaging within 24-48 hours)
- 5% critical (respiratory failure, shock, or multi-organ dysfunction)

The overall Case Fatality Rate (CFR) was 2.3%.

There were no deaths reported in non-critical cases.
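
A bar chart of these proportions can be sketched with ggplot2 as follows. This is not the code behind the original figure: the data frame is typed in from the percentages above, and the labels and styling are illustrative choices. The ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Severity shares as reported in the text
severity <- data.frame(
  level   = factor(c("Mild", "Severe", "Critical"),
                   levels = c("Mild", "Severe", "Critical")),
  percent = c(81, 14, 5)
)

severity_plot <- ggplot(severity, aes(x = level, y = percent)) +
  geom_col() +
  # print the percentage just above each bar
  geom_text(aes(label = paste0(percent, "%")), vjust = -0.5) +
  ylim(0, 100) +
  labs(title = "Spectrum of illness severity",
       x = "Severity", y = "Share of confirmed cases (%)")

print(severity_plot)
```

Storing level as a factor with explicit levels keeps the bars in order of severity rather than alphabetical order.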


Source: Kanevsky, 2020

A bar chart - Clinical Manifestations

The most frequent manifestation of infection is pneumonia, which can be characterized by symptoms such as fever, cough, dyspnea (shortness of breath), and bilateral infiltrates on chest imaging. An analysis of 138 pneumonia patients in Wuhan, China found the following to be the most common clinical features at the onset of illness:
- Fever 99%
- Fatigue 70%
- Dry cough 59%
- Anorexia 40%
- Myalgias (muscle aches) 35%
- Dyspnea (shortness of breath) 31%
- Sputum production (coughing up and spitting out the material produced in the respiratory tract) 27%
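
With this many categories, a horizontal bar chart sorted by frequency tends to be easier to read. The following is a minimal sketch, not the code behind the original figure: the data frame is typed in from the percentages above, the labels are illustrative, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Clinical features at onset of illness, as reported in the text
symptoms <- data.frame(
  feature = c("Fever", "Fatigue", "Dry cough", "Anorexia",
              "Myalgias", "Dyspnea", "Sputum production"),
  percent = c(99, 70, 59, 40, 35, 31, 27)
)

# reorder() sorts the bars by percent; coord_flip() makes them horizontal
symptom_plot <- ggplot(symptoms,
                       aes(x = reorder(feature, percent), y = percent)) +
  geom_col() +
  coord_flip() +
  labs(title = "Clinical features at onset of illness",
       x = NULL, y = "Patients (%)")

print(symptom_plot)
```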


Source: Kanevsky, 2020

A bar & line chart - Case Fatality Rate (CFR) by Age Groups

The Case Fatality Rate (CFR) ranged from 5.8% in Wuhan to 0.7% across the rest of China. Most of the fatal cases were among the elderly or those with preexisting conditions. The proportion of severe or fatal infections may vary by location. For example, in Italy, 12% of all COVID-19 infected persons and 16% of all hospitalized patients were admitted to the ICU, and in mid-March the estimated CFR was 7.2%. The median age of infected patients in Italy was 64 years. In comparison, the CFR in South Korea in mid-March was 0.9% and the median age was in the 40s.


Source: Kanevsky, 2020

A bar chart with timeline - Period of Infectivity

The exact interval during which a COVID-19 patient is infectious is uncertain. Much of the data on the interval of infectivity comes from studies that detect viral RNA in respiratory and other specimens; however, detection of viral RNA does not necessarily mean that infectious virus is present.

Based on a series of studies, viral RNA levels appear to be higher soon after the onset of infection than later on. Transmission may therefore be more likely in the earlier stage of infection than in the later stages.

The duration of viral shedding also varies widely and appears to depend on the severity of the illness. In one study, 90% of 21 patients with mild illness repeatedly tested negative for viral RNA on nasal swabs by 10 days after the onset of symptoms, while tests remained positive for longer in patients with more severe illness. In another study of 137 patients who had recovered from COVID-19, the median duration of viral shedding was 20 days.


Source: Kanevsky, 2020

Animated growth of confirmed cases

Rather than a static map, the data can also be visualized over time in dynamic form using the magick R package.

Using the same variables set previously, it is possible to create an animated heat map.
The animated heat map below shows the development of COVID-19 from February 1st, 2020 to March 31st, 2020.
It shows the virus first appearing in China and then spreading across the world.

install.packages("magick")  

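library(magick)
require(nCov2019)
x <- get_nCov2019()   # latest statistics
y <- load_nCov2019()  # historical data

# Dates from 2020-02-01 to 2020-03-31 (2020 was a leap year, so February has 29 days)
d <- c(paste0("2020-02-", 1:29), paste0("2020-03-", 1:31))
# Open a magick graphics device; each plot drawn becomes one frame
img <- image_graph(1200, 700, res = 96)
out <- lapply(d, function(date) {
  p <- plot(y, date = date,
            label = FALSE, continuous_scale = TRUE)
  print(p)
})
dev.off()
# Combine the frames into an animation at 2 frames per second
animation <- image_animate(img, fps = 2)
print(animation)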
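
The same image_graph() and image_animate() pattern works with any plot. The following self-contained sketch animates a base R line plot and saves it as a GIF, so the mechanics can be tried without downloading the COVID-19 data. The data, the number of frames, and the output file are made up for illustration, and the magick package is assumed to be installed.

```r
library(magick)

# Made-up data: a curve that is revealed a few points at a time
values <- cumsum(1:10)

# Open a magick graphics device; each plot drawn becomes one frame
img <- image_graph(400, 300, res = 72)
for (n in seq(2, 10, by = 2)) {
  plot(values[1:n], type = "l",
       xlim = c(1, 10), ylim = range(values),
       main = paste("First", n, "points"),
       xlab = "Index", ylab = "Value")
}
invisible(dev.off())

# Combine the frames into an animation at 2 frames per second
animation <- image_animate(img, fps = 2)

# Save the animation to a temporary GIF file
gif_file <- tempfile(fileext = ".gif")
image_write(animation, gif_file)
```

Fixing xlim and ylim keeps the axes identical in every frame, so only the data appears to move.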

List of References

Holtz, Y., 2020. Data Visualization With R And Ggplot2. [online] R-graph-gallery.com. Available at: https://www.r-graph-gallery.com/ggplot2-package.html.

Kanevsky, G., 2020. Facts About Coronavirus Disease 2019 (COVID-19) In 5 Charts Created With R And Ggplot2. [online] Novyden.blogspot.com. Available at: https://novyden.blogspot.com/2020/03/facts-about-coronavirus-disease-2019.html.

Kumar, P., 2020. Understanding Plot() Function In R. [online] JournalDev. Available at: https://www.journaldev.com/36083/plot-function-in-r.

Pedersen, T., 2020. Ggplot2 Package. [online] Rdocumentation.org. Available at: https://www.rdocumentation.org/packages/ggplot2/versions/3.3.2.

EDUCBA. 2020. Plot Function In R. [online] Available at: https://www.educba.com/plot-function-in-r/.

Qian, X., 2020. Visualize The Pandemic With R #COVID-19. [online] Medium. Available at: https://towardsdatascience.com/visualize-the-pandemic-with-r-covid-19-c3443de3b4e4.